Using the Visual Denotations of Image Captions for Semantic Inference

Authors

Gerald DeJong

Dan Roth

Abstract

Semantic inference is essential to natural language understanding. There are two traditional approaches to it. The logic-based approach translates utterances into a formal meaning representation that is amenable to logical proofs. The vector-based approach maps words to vectors based on the contexts in which the words appear in utterances, and uses real-valued similarities in place of logical inferences. We introduce the notion of the visual denotation of an utterance, which is the set of images that it describes. This notion borrows from the logic-based approach the abstract concept of the denotation of an utterance as the set of possible worlds in which the utterance is true, and instantiates possible worlds as images.

In this dissertation, we show how visual denotations can be created for descriptions of everyday entities and events. We also demonstrate that visual denotations can be used as a new model of semantic similarity, and that this model is better at identifying entailment relations between descriptions of images than traditional distributional similarities. To do this, we create an image caption corpus consisting of captions and images depicting everyday actions. The corpus has a number of features that are useful for investigating everyday events and the different ways in which they can be described. We use the captions in the corpus as the starting point for producing caption fragments with larger visual denotations. We do so by creating a denotation graph, a subsumption hierarchy over the captions that links captions to the images that depict them and that also allows the image caption corpus to be visualized and navigated in an intuitive manner.
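As a rough illustration of the idea of a visual denotation, the sketch below treats the denotation of an expression as the set of image identifiers it describes, and scores a premise/hypothesis pair by how much of the premise's denotation is shared with the hypothesis. Everything here is an assumption made for illustration: the toy data, the hand-written caption fragments, and the names `build_denotations` and `conditional_probability` are hypothetical, and the overlap-based score is only one plausible similarity over denotations, not necessarily the one used in the dissertation.

```python
from collections import defaultdict

# Hypothetical toy corpus (invented for illustration): each image id maps to
# its caption plus shorter, more general fragments written by hand.
DESCRIPTIONS = {
    "img1": {"dog run on beach", "dog run", "dog"},
    "img2": {"dog run in park", "dog run", "dog"},
    "img3": {"man play guitar", "man"},
    "img4": {"dog sleep", "dog"},
}


def build_denotations(descriptions):
    """Map each expression to its visual denotation: the set of images it describes."""
    denotations = defaultdict(set)
    for image_id, expressions in descriptions.items():
        for expression in expressions:
            denotations[expression].add(image_id)
    return dict(denotations)


def conditional_probability(denotations, premise, hypothesis):
    """Fraction of the premise's images that the hypothesis also describes.

    Assumed scoring function: |D(premise) & D(hypothesis)| / |D(premise)|.
    Returns 0.0 when the premise has an empty denotation.
    """
    premise_images = denotations.get(premise, set())
    hypothesis_images = denotations.get(hypothesis, set())
    if not premise_images:
        return 0.0
    return len(premise_images & hypothesis_images) / len(premise_images)


DENOTATIONS = build_denotations(DESCRIPTIONS)

# A more general fragment has a larger denotation than a more specific one.
print(sorted(DENOTATIONS["dog"]))
print(sorted(DENOTATIONS["dog run"]))

# The score is asymmetric, which is what makes it suggestive of entailment.
print(conditional_probability(DENOTATIONS, "dog run", "dog"))
print(conditional_probability(DENOTATIONS, "dog", "dog run"))
```

In this toy setting, every image described by "dog run" is also described by "dog", so the specific expression scores highly as a premise for the general one, while the reverse direction scores lower. A real denotation graph would derive the fragments from the captions rather than list them by hand.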

Similar resources

From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

We propose to use the visual denotations of linguistic expressions (i.e. the set of images they describe) to define novel denotational similarity metrics, which we show to be at least as beneficial as distributional similarities for two tasks that require semantic inference. To compute these denotational similarities, we construct a denotation graph, i.e. a subsumption hierarchy over constituen...


SEIMCHA: a new semantic image CAPTCHA using geometric transformations

As protection of web applications is becoming more important every day, CAPTCHAs are receiving growing attention from both users and designers. Nowadays, it is well accepted that using visual concepts enhances the security and usability of CAPTCHAs. There exist a few major ideas for designing image CAPTCHAs. Some methods apply a set of modifications such as rotations to the original imag...


Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were apportioned to each group of images separately. In this regard, each group image surr...


Enhanced Sports Image Annotation and Retrieval Based Upon Semantic Analysis of Multimodal Cues

This paper presents a framework for semi-automatic annotation and semantic image retrieval, applied to the sports domain, based upon semantic analysis of both image text captions and visual features of the image. Unstructured text captions of images are analysed in order to extract the concepts and restructure them into a semantic model. SVM classification of the multi-dominant colours and edge...


Mapping between image regions and caption concepts of captioned depictive photographs

We discuss the obstacles to inference of correspondences between objects within photographic images and their counterpart concepts in descriptive captions of those images. This is important for information retrieval of photographic data since its content analysis is much harder than linguistic analysis of its captions. We argue that the key mapping is between certain caption concepts representin...


Publication date: 2013
